An Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data

Abstract

Outlier detection is an important task in the field of data mining and a highly active area of research in machine learning. In industrial automation, datasets are often high-dimensional, meaning that an effort to study all dimensions directly leads to data sparsity, thus causing outliers to be masked by noise effects in high-dimensional spaces. The “curse of dimensionality” phenomenon renders many conventional outlier detection methods ineffective. This paper proposes a new outlier detection algorithm called EOEH (Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data). First, random secondary subsampling is performed on the data, and detectors are run on various small-scale sub-samples to provide diverse detection results. Results are then aggregated to reduce the global variance and enhance the robustness of the algorithm. Subsequently, information entropy is utilized to construct a dimension-space weighting method that can discern the influential factors within different dimensional spaces. This method generates weighted subspaces and dimensions for data objects, reducing the impact of noise created by high-dimensional data and improving high-dimensional data detection performance. Finally, this study offers a design for a new high-precision local outlier factor (HPLOF) detector that amplifies the differentiation between normal and outlier data, thereby improving the detection performance of the algorithm. The feasibility of this algorithm is validated through experiments that used both simulated and UCI datasets. In comparison to popular outlier detection algorithms, our algorithm demonstrates superior detection performance and runtime efficiency. Compared with current popular, common algorithms, the EOEH algorithm improves detection performance by 6% on average. In terms of running time on high-dimensional data, EOEH is 20% faster than current popular algorithms.
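The pipeline outlined in the abstract (subsample, weight dimensions by entropy, score, aggregate) can be illustrated with a minimal sketch. This is not the EOEH implementation: the abstract gives no formulas, so the entropy-to-weight rule, the histogram estimator, and every function and parameter name below are assumptions made for illustration. The HPLOF detector is not reproduced either; a plain mean k-nearest-neighbour distance stands in for it.

```python
import math
import random


def dimension_entropy(column, bins=10):
    """Shannon entropy (nats) of one feature from an equal-width histogram."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return 0.0  # a constant feature has zero entropy
    counts = [0] * bins
    for v in column:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(column)
    return -sum(c / n * math.log(c / n) for c in counts if c)


def entropy_weights(data, bins=10):
    """Assumed rule: the lower a dimension's entropy, the larger its weight."""
    dims = len(data[0])
    max_ent = math.log(bins)
    raw = []
    for d in range(dims):
        ent = dimension_entropy([row[d] for row in data], bins)
        raw.append(max(0.0, 1.0 - ent / max_ent))
    total = sum(raw)
    if total == 0.0:
        return [1.0 / dims] * dims  # fall back to uniform weights
    return [r / total for r in raw]


def weighted_distance(a, b, weights):
    """Euclidean distance with per-dimension weights."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_score(point, sample, weights, k):
    """Stand-in detector (not HPLOF): mean distance to the k nearest sample points."""
    dists = sorted(
        weighted_distance(point, s, weights) for s in sample if s is not point
    )
    nearest = dists[:k]
    return sum(nearest) / max(len(nearest), 1)


def ensemble_scores(data, n_rounds=20, sample_size=16, k=3, seed=0):
    """Average the detector score over several random sub-samples."""
    rng = random.Random(seed)
    weights = entropy_weights(data)
    size = min(sample_size, len(data))
    totals = [0.0] * len(data)
    for _ in range(n_rounds):
        sample = rng.sample(data, size)
        for i, point in enumerate(data):
            totals[i] += knn_score(point, sample, weights, k)
    return [t / n_rounds for t in totals]
```

Calling `ensemble_scores(data)` on a list of equal-length numeric rows returns one averaged score per row, with larger values indicating more outlying objects. Averaging over sub-samples is what damps the variance of any single detector run, and each run only compares against `sample_size` points rather than the full dataset.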

Similar resources

Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data

We propose an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space. In particular, for each object in the data set, we explore the axis-parallel subspace spanned by its neighbors and determine how much the object deviates from the neighbors in this subspace. In our experiments, we show that our novel subspace outlier detection is super...

Outlier detection for high dimensional data pdf

Is particularly useful for high dimensional data where outliers cannot be found. High dimensional data in Euclidean space pose special challenges to data. In just the last few years, the task of unsupervised outlier detection has found. Outlier detection is an outstanding data mining task referred to...

Outlier detection for high-dimensional data

Outlier detection is an integral component of statistical modelling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are usually not applicable. We propose an outlier detection procedure that replaces the classical minimum covariance determinant estimator with a high-breakdown minimum diagonal product estimator. The cut-off value is obtained from the...

Disk-Based Sampling for Outlier Detection in High Dimensional Data

We propose an efficient sampling based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a “sampling” strategy with a simple randomized partitioning technique to generate a candidate set of outliers. This phase requires one full data scan and the running time has linear complexity with respect to the size and dimensionali...

Fast target detection method for high-resolution SAR images based on variance weighted information entropy

Since the traditional CFAR algorithm is not suitable for high-resolution target detection in synthetic aperture radar (SAR) images, a new two-stage target detection method based on variance weighted information entropy is proposed in this paper. In the first stage, the regions of interest (ROIs) in the SAR image are extracted based on the variance weighted information entropy (WIE), which has been p...

Journal

Journal title: Entropy

Year: 2023

ISSN: 1099-4300

DOI: https://doi.org/10.3390/e25081185